DataSifter II: Partially synthetic data sharing of sensitive information containing time-varying correlated observations.

There is significant public demand for rapid data-driven scientific investigations using aggregated sensitive information. However, many technical challenges and regulatory policies hinder efficient data sharing. In this study, we describe a partially synthetic data generation technique for creating anonymized data archives whose joint distributions closely resemble those of the original (sensitive) data. Specifically, we introduce the DataSifter technique for time-varying correlated data (DataSifter II), which relies on iterative model-based imputation using generalized linear mixed models and random effects-expectation maximization trees. DataSifter II can be used to generate synthetic repeated measures data for testing and validating new analytical techniques. Applied to simulated and real clinical data, DataSifter II substantially reduces re-identification risk (data privacy) relative to the multiple imputation method while preserving the analytical value (data utility) of the obfuscated data. In a simulation with 20% artificial missingness in the data, DataSifter II reduces the disclosure risk by at least 80% compared to the multiple imputation method, without a substantial impact on the analytical value of the data. In a separate validation on clinical data (Medical Information Mart for Intensive Care III), a model-based statistical inference drawn from the original data agrees with the analogous inference obtained from the DataSifter II obfuscated (sifted) data. For large time-varying datasets containing sensitive information, the proposed technique provides an automated tool for lowering the barriers to data sharing and facilitating effective, advanced, and collaborative analytics.

Open Access
An Adaptive De-Noising Method via the Lifting Scheme

Wavelet-transform de-noising based on threshold shrinkage is widely used. However, the classical threshold shrinkage method may introduce a constant deviation or additional oscillation after reconstruction. If the threshold is chosen too large, some useful signal points are eliminated; if it is chosen too small, some of the noise is kept. In addition, the “first generation wavelet” de-noising method can use only one wavelet basis, which cannot be changed during the de-noising process. Because each wavelet has its own characteristics and the signals it suits, choosing the wavelet from a small set of wavelets according to the local features of the signal, namely using multiwavelets to decompose the signal comprehensively, can achieve local optimality and then overall optimization. This paper proposes a new threshold function and improves the threshold selection scheme so that the threshold is chosen dynamically according to the wavelet decomposition level. Finally, on the basis of lifting theory, this paper proposes an adaptive method that selects the wavelet dynamically according to the local features of the signal, combining the advantages of each wavelet to achieve a better de-noising effect and a certain degree of adaptability. Experiments show the effectiveness of the proposed optimization.
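
To make the ingredients named in this abstract concrete, here is a minimal sketch of threshold-shrinkage de-noising built on a lifting step. It is not the paper's method: it assumes a fixed Haar wavelet (no adaptive wavelet selection), the standard soft-threshold function in place of the paper's new threshold function, and a universal threshold scaled per decomposition level as one possible level-dependent rule. All function names are hypothetical.

```python
import numpy as np

def haar_lift_forward(x):
    """One Haar level via lifting: split into even/odd, predict, update."""
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    detail = odd - even            # predict each odd sample from its even neighbour
    approx = even + detail / 2.0   # update so approx holds the pairwise mean
    return approx, detail

def haar_lift_inverse(approx, detail):
    """Undo the lifting steps in reverse order and merge the samples."""
    even = approx - detail / 2.0
    odd = detail + even
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x

def soft_threshold(d, t):
    """Standard soft shrinkage: shrink magnitudes by t, zero anything below t."""
    return np.sign(d) * np.maximum(np.abs(d) - t, 0.0)

def denoise(x, levels=3):
    """Decompose, shrink details with a level-dependent threshold, reconstruct."""
    x = np.asarray(x, dtype=float)
    if x.size % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = x, []
    for _ in range(levels):
        approx, d = haar_lift_forward(approx)
        details.append(d)
    # Assumption: noise scale estimated from the finest details (MAD estimator).
    sigma1 = np.median(np.abs(details[0])) / 0.6745
    base = sigma1 * np.sqrt(2.0 * np.log(x.size))
    for j in reversed(range(levels)):
        # With this unnormalised lifting, white-noise std in the details
        # drops by sqrt(2) per level, so the threshold is scaled to match.
        t = base / 2 ** (j / 2.0)
        approx = haar_lift_inverse(approx, soft_threshold(details[j], t))
    return approx
```

A typical call would be `clean = denoise(noisy_signal, levels=3)` on a one-dimensional array whose length is a multiple of 8. Making the wavelet itself adaptive, as the abstract describes, would mean choosing different predict and update steps per region of the signal, which this sketch does not attempt.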

Open Access